Abstract: Detection of defective members of large populations
has been widely studied in the statistics community under
the name "group testing", a problem which dates back to World 
War II when it was suggested for syphilis screening. There, 
the main interest is to identify a small number of infected 
people among a large population using collective samples. In 
viral epidemics, one way to acquire collective samples is by 
sending agents inside the population. While in classical group 
testing, it is assumed that the sampling procedure is fully known 
to the reconstruction algorithm, in this work we assume that 
the decoder possesses only partial knowledge about the sampling 
process. This assumption is justified by observing the fact that in 
a viral sickness, there is a chance that an agent remains healthy 
despite having contact with an infected person. Therefore, the 
reconstruction method has to cope with two different types of 
uncertainty; namely, identification of the infected population 
and the partially unknown sampling procedure. 

In this work, by using a natural probabilistic model for 
"viral infections", we design non-adaptive sampling procedures 
that allow successful identification of the infected population 
with overwhelming probability $1 - o(1)$. We propose both
probabilistic and explicit design procedures that require a 
"small" number of agents to single out the infected individuals. 
More precisely, for a contamination probability $p$, the number
of agents required by the probabilistic and explicit designs
for identification of up to $k$ infected members is bounded by
$m = O(k^2 (\log n)/p^2)$ and $m = O(k^2 (\log^2 n)/p^2)$, respectively.
In both cases, a simple decoder is able to successfully identify
the infected population in time $O(mn)$.

I. Introduction 

Suppose that we have a large population in which only 
a small number of people are infected by a certain viral 
disease (e.g., one may think of a flu epidemic), and that 
we wish to identify the infected ones. By testing each 
member of the population individually, we can expect the 
cost of the testing procedure to be large. If we could 
instead pool a number of samples together and then test 
the pool collectively, the number of tests required might be 
reduced. This is the main conceptual idea behind the classical 
group testing problem which was introduced by Dorfman 
[1] and later found applications in a variety of areas. A few
examples of such applications include testing for defective 
items (e.g., defective light bulbs or resistors) as a part of 
industrial quality assurance [2], DNA sequencing [3] and 
DNA library screening in molecular biology (see, e.g., [4],
[5], [6], [7], [8] and the references therein), multiaccess
communication [9], data compression [10], pattern matching
[11], streaming algorithms [12], software testing [13], and
compressed sensing [14]. See the books by Du and Hwang
[15], [16] for a detailed account of the major developments
in this area.

Fig. 1. Collective sampling using agents. Infected people are
marked among healthy people, who are indicated by • symbols. The
dashed lines show the individuals contacted by the agents.

One way to acquire collective samples is by sending agents 
inside the population whose task is to contact people (see 
Fig. 1). The agents can also be chosen as ATM machines,
cashiers in supermarkets, among other possibilities. Once an 
agent has made contact with an "infected" person, there is a 
chance that he gets infected, too. By the end of the testing 
procedure, all agents are gathered and tested for the disease. 
Here, we assume that each agent has a log file by which 
one can figure out with whom he has made contact. One 
way to implement the log in practice is to use identifiable 
devices (for instance, cell phones) that can exchange unique 
identifiers when in range. This way, one can for instance ask 
an agent to randomly meet a certain number of people in the 
population and at the end learn which individuals have been 
met from the data gathered by the device that is carried by the 
agent. Note that, even if an agent contacts an infected person, 
he will not get infected with certainty. Hence, it may well 
happen that an agent's result is negative (meaning that he is 
not infected) despite a contact with some infected person. We 
will assume that when an agent gets infected, the resulting 
infection will not be contagious, i.e., an agent never infects
other people. Our ultimate goal is to identify the infected
persons with the use of a simple recovery algorithm, based
on the test results.¹ We remark that this model is applicable
in certain scenarios different from what we described as 
well. For instance, in classical group testing, "dilution" of 
a sample might make some of the items present in a pool 
ineffective. The effect of dilution can be captured by the 
notion of contamination in our model. 

It is important to notice the difference between this setup 
and the classical group testing where each contact with an 
infected person will infect the agent with certainty. In other 
words, in the classical group testing the decoder fully knows 
the sampling procedure, whereas in our setup, it has only 
uncertain knowledge. Hence, in this scenario the decoder has 
to cope simultaneously with two sources of uncertainty, the 
unknown group of infected people and the partially unknown 
(or stochastic) sampling procedure. 

The collective sampling can be done in an adaptive or a non-
adaptive fashion. In the former, samplings are carried out
one at a time, possibly depending on the outcomes of the
previous agents. In the latter, the sampling strategy
is specified and fixed before seeing the test outcome
of any of the agents. In this paper we focus only on non-
adaptive sampling methods, which are more favorable for
applications.

The idea behind our setup is mathematically related to 
compressed sensing [17], [18]. Nevertheless, they differ 
in a significant way: In compressed sensing, the samples 
are gathered as linear observations of a sparse real signal 
and typically tools such as linear programming methods
are applied for the reconstruction. To do so, it is assumed
that the decoder knows the measurement matrix a priori. 
However, this is not the case in our setup. In other words, 
using the language of compressed sensing, in our scenario the 
measurement matrix might be "noisy" and is not precisely 
known to the decoder. As it turns out, by using a sufficient 
number of agents this issue can be resolved. 

II. Problem Setting and Summary of the Results 

To model the problem, we enumerate the individuals from
1 to $n$ and the agents from 1 to $m$. Let the non-zero
entries of $x := (x_1, x_2, \ldots, x_n) \in \mathbb{F}_2^n$ indicate the infected
individuals within the population. Moreover, we assume that
$x$ is a $k$-sparse vector, i.e., it has at most $k$ nonzero entries
(corresponding to the infected population). We refer to the
support set of $x$ as the set which contains the positions of
the nonzero entries.

¹ In this work we focus on the exact reconstruction of the set of infected
individuals in the worst case (i.e., regardless of the choice of this set).

As is typical in the literature of group testing and compressed
sensing, to model the non-adaptive samplings done by the
agents, we introduce an $m \times n$ boolean contact matrix $M^c$,
where we set $M^c_{ij}$ to one if and only if the $i$th agent contacts
the $j$th person. As we see, the matrix $M^c$ only shows
which agents contact which persons. In particular, it does
not indicate whether the agents eventually get affected by the
contact. Let us assume that at each contact with a sick person
an agent gets infected independently with probability $p$ (a
fixed parameter that we call the contamination probability).
Therefore, the real sampling matrix $M^s$ can be thought of
as a variation of $M^c$ in the following way:

• Each non-zero entry of $M^c$ is flipped to 0 independently
with probability $1 - p$;
• The resulting matrix $M^s$ is used just as in classical
group testing to produce the outcome vector $y \in \mathbb{F}_2^m$,

\[ y = M^s x, \tag{1} \]

where the arithmetic is boolean (i.e., multiplication with
the logical AND and addition with the logical OR).

The contact matrix $M^c$, the outcome vector $y$, the number
of non-zero entries $k$, and the contamination probability $p$
are known to the decoder, whereas the sampling matrix $M^s$
(under which the collective samples are taken) and the input
vector $x$ are unknown. The task of the decoder is to identify
the $k$ non-zero entries of $x$ based on the known parameters.
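
The sampling model above can be simulated in a few lines. The following is a minimal Python sketch, not part of the original design: the function names `sample_matrix` and `outcomes` are hypothetical, and the contact pattern is the three-agent, six-person instance used in Example 1.

```python
import random


def sample_matrix(contact, p, rng):
    """Draw a sampling matrix from a contact matrix.

    Every 1 of the contact matrix survives independently with probability p
    (i.e., it is flipped to 0 with probability 1 - p); zeros stay zeros.
    """
    return [[1 if (entry == 1 and rng.random() < p) else 0 for entry in row]
            for row in contact]


def outcomes(sampling, x):
    """Boolean product y = M^s x (AND for multiplication, OR for addition)."""
    return [int(any(m_ij and x_j for m_ij, x_j in zip(row, x)))
            for row in sampling]


# Contact matrix and input vector of Example 1 (persons 3 and 4 are infected).
CONTACT = [
    [1, 0, 1, 0, 1, 0],  # agent 1 contacts persons 1, 3, 5
    [0, 1, 0, 1, 0, 1],  # agent 2 contacts persons 2, 4, 6
    [0, 1, 1, 0, 1, 1],  # agent 3 contacts persons 2, 3, 5, 6
]
X = [0, 0, 1, 1, 0, 0]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the run repeatable
    sampling = sample_matrix(CONTACT, p=0.5, rng=rng)
    print("sampling matrix:", sampling)
    print("outcome vector: ", outcomes(sampling, X))
```

With $p = 1$ the sampling matrix coincides with the contact matrix and the model reduces to classical group testing; for $p < 1$ the decoder sees only `CONTACT` and the outcome vector, never the sampled matrix.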

Example 1: As a toy example, consider a population with 
6 members where only two of them (persons 3 and 4) are 
infected. We send three agents to the population, where the 
first one contacts persons 1,3,5, the second one contacts 
persons 2,4,6, and the third one contacts persons 2,3,5,6. 
Therefore, the contact matrix and the input vector have the 
following form 

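\[ x = (0, 0, 1, 1, 0, 0)^\top, \qquad \mathrm{supp}(x) = \{3, 4\}, \]

\[ M^c = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 1 & 1 \end{pmatrix}. \]

Let us assume that only the second agent gets infected. This
means that the outcome vector is

\[ y = (0, 1, 0)^\top. \]

As we can observe, there are many possibilities for the
sampling matrix, all of the following form:

\[ M^s = \begin{pmatrix} ? & 0 & ? & 0 & ? & 0 \\ 0 & ? & 0 & ? & 0 & ? \\ 0 & ? & ? & 0 & ? & ? \end{pmatrix}, \]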

where the question marks are 0 with probability $1 - p$
and 1 with probability $p$. It is the decoder's task to figure
out which combinations make sense based on the outcome
vector. For example, the following matrices and input vectors
[...]

More formally, the goal of our scenario is two-fold:

1) Designing the contact matrix $M^c$ so that it allows
unique reconstruction of any sparse input $x$ from the
outcome $y$ with overwhelming probability ($1 - o(1)$)
over the randomness of the sampling matrix $M^s$.

2) Proposing a recovery algorithm with low computational
complexity.

In this work, we present a probabilistic and a deterministic 
approach for designing contact matrices suitable for our 
problem setting along with a simple decoding algorithm for 
reconstruction. Our approach is to first introduce a rather 
different setting for the problem that involves no randomness 
in the way the infection spreads out. Namely, in the new 
setting an adversary can arbitrarily decide whether a certain 
contact with an infected individual results in a contamination 
or not, and the only restriction on the adversary is on the 
total amount of contaminations being made. In this regard, 
the relationship between the adversarial variation of the 
problem and the original (stochastic) problem can be thought 
of akin to the one between the combinatorial problem of 
designing block codes with large minimum distances as 
opposed to designing codes for stochastic communication 
channels. The reason for introducing the adversarial problem 
is its combinatorial nature that allows us to use standard tools 
and techniques already developed in combinatorial group 
testing. Fortunately it turns out that solving the adversarial 
variation is sufficient for the original (stochastic) problem. 
We discuss this relationship and an efficient reconstruction
algorithm in Section III.

Our next task is to design contact matrices suitable for 
the adversarial (and thus, stochastic) problem. We extend 
two standard techniques from group testing to our setting. 
Namely, we give a probabilistic and an explicit construction
of the contact matrix in Sections IV and V, respectively.
The probabilistic construction requires each agent to independently
contact any individual with a certain well-chosen
probability and ensures that the resulting data gathered at the
end of the experiment can be used for correct identification of
the infected population with overwhelming probability, provided
that the number of agents is sufficiently large. Namely,
for contamination probability $p$, we require $O(k^2 (\log n)/p^2)$
agents, where $k$ is the estimate on the size of the infected
population. The explicit construction, on the other hand,
precisely determines which agent should contact which
individual, and guarantees correct identification with certainty
in the adversarial setting and with overwhelming probability
(over the randomness of the contaminations) in the stochastic
setting. This construction requires $O(k^2 (\log^2 n)/p^2)$ agents,
which is inferior to what is achieved by the probabilistic
construction by a factor of $O(\log n)$.

We point out that, very recently, Atia and Saligrama [19] 
developed an information theoretic perspective applicable to 
a variety of group testing problems, including a "dilution 
model" which is closely related to what we consider in 
this work. Contrary to our combinatorial approach, they use 
information theoretic techniques to obtain bounds on the 
number of required measurements. Their bounds are with 
respect to random constructions and typical set decoding as 
the reconstruction method. Specifically, in our terminology
with contamination probability $p$, they obtain an information
theoretic upper bound of $O(k^2 \log n / p^2)$ on the number of
measurements, which is comparable to what we obtain in
our probabilistic construction.

Remark: As is customary in the standard group testing
literature, we think of the sparsity $k$ as a parameter that is
noticeably smaller than the population size $n$; for example,
one may take $k = O(n^{1/2})$. Indeed, if $k$ becomes comparable
to $n$, there would be little point in using a group testing
scheme and in practice, for large $k$ it is generally more favorable
to perform trivial tests on the individuals. Nevertheless
it is easy to observe that our probabilistic scheme can in
general achieve $m = O(k^2 \log(n/k)/p^2)$, but we ignore such
refinements for the sake of clarity.

III. Adversarial Setting 

The problem described in Section II has a stochastic
nature, in that the sampling matrix is obtained from the
contact matrix through a random process. In this section we
introduce an adversarial variation of the problem that we find
more convenient to work with.

In the adversarial variation of the problem, the sampling
matrix is obtained from the contact matrix by flipping up to
$e$ arbitrary entries to 0 on the support (i.e., the set of nonzero
entries) of each column of $M^c$, for some error parameter $e$.
The goal is to be able to exactly identify the sparse vector 
despite the perturbation of the contact matrix and regardless 
of the choice of the altered entries. Note that the classical 
group testing problem corresponds to the special case e = 0. 
Thus the only difference between the adversarial problem and 
the stochastic one is that in the former problem the flipped 
entries of the contact matrix are chosen arbitrarily (as long 
as there are not too many flips) while in the latter they are 
chosen according to a specific random process. 

It turns out that the combinatorial tool required for solving 
the adversarial problem is precisely the notion of disjunct 
matrices that is well studied in the group testing literature. 
The formal definition is as follows. 

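Definition 2: A boolean matrix $M$ with $n$ columns
$M_1, \ldots, M_n$ is called $(k, e)$-disjunct if, for every subset
$S \subseteq [n]$ of the columns with $|S| \le k$, and every $i \notin S$, we
have

\[ \Bigl| \mathrm{supp}(M_i) \setminus \Bigl( \bigcup_{j \in S} \mathrm{supp}(M_j) \Bigr) \Bigr| > e, \]

where $\mathrm{supp}(M_i)$ denotes the support of the column $M_i$.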
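
For small instances, the condition of Definition 2 can be checked by brute force. The sketch below is only an illustration under our own naming (`column_supports`, `is_disjunct`, and `repeated_identity` are hypothetical helpers); its running time is exponential in $k$, so it is meant for sanity checks rather than for matrices of realistic size.

```python
from itertools import combinations


def column_supports(matrix):
    """Return, for each column, the set of row indices holding a 1."""
    n = len(matrix[0])
    return [{r for r, row in enumerate(matrix) if row[j]} for j in range(n)]


def is_disjunct(matrix, k, e):
    """Brute-force test of (k, e)-disjunctness as in Definition 2."""
    supports = column_supports(matrix)
    n = len(supports)
    for size in range(min(k, n - 1) + 1):          # every |S| <= k
        for subset in combinations(range(n), size):
            covered = set().union(*(supports[j] for j in subset))
            for i in range(n):
                if i in subset:
                    continue
                # Column i must keep more than e rows not covered by S.
                if len(supports[i] - covered) <= e:
                    return False
    return True


def repeated_identity(n, copies):
    """Toy contact matrix: `copies` stacked n x n identity matrices."""
    return [[1 if j == r % n else 0 for j in range(n)]
            for r in range(n * copies)]


if __name__ == "__main__":
    toy = repeated_identity(4, 3)      # every column has weight 3
    print(is_disjunct(toy, k=2, e=2))  # disjoint supports: 3 > 2 uncovered rows
    print(is_disjunct(toy, k=2, e=3))  # fails: 3 is not greater than 3
```

The stacked-identity matrix is trivially disjunct but uses $m = 3n$ rows, which is exactly the regime group testing tries to avoid; it serves here only as an easily verified test case.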

The following proposition shows a one-to-one correspon- 
dence between contact matrices suitable for the adversarial 
problem and disjunct matrices: 

Proposition 3: Let $M$ be a $(k, e)$-disjunct matrix. Then
taking $M$ as the contact matrix solves the adversarial problem
for $k$-sparse vectors with error parameter $e$. Conversely,
any matrix that solves the adversarial problem must be
$(k - 1, e)$-disjunct.

Proof: Let $M$ be a $(k, e)$-disjunct matrix and consider
$k$-sparse vectors $x, x'$ supported on different subsets
$S, S' \subseteq [n]$. Take an element $i \in S'$ which is not in $S$. By
Definition 2 we know that the column $M_i$ has more than $e$
entries on its support that are not present in the support of
any $M_j$, $j \in S$. Therefore, even after $e$ bit flips in $M_i$, at
least one entry in its support remains that is not present in
the measurement outcome of $x$ (while it is present in that of $x'$),
and this makes $x$ and $x'$ distinguishable.

For the reverse direction, suppose that $M$ is not $(k - 1, e)$-disjunct
and take any $i \in [n]$ and $S \subseteq [n]$ with $|S| \le k - 1$,
$i \notin S$ which demonstrate a counterexample for $M$
being $(k - 1, e)$-disjunct. Consider $k$-sparse vectors $x$ and
$x'$ supported on $S$ and $S \cup \{i\}$, respectively. An adversary
can flip up to $e$ bits on the support of $M_i$ from 1 to 0, leave
the rest of $M$ unchanged, and ensure that the measurement
outcomes for $x$ and $x'$ coincide. Thus $M$ is not suitable for
the adversarial problem. ■

Of course, posing the adversarial problem is only interest- 
ing if it helps in solving the original stochastic problem from 
which it originates. Below we show that this is indeed the 
case; and in fact the task of solving the stochastic problem 
reduces to that of the adversarial problem; and thus after this 
point it suffices to focus on the adversarial problem. 

Proposition 4: Suppose that $M$ is an $m \times n$ contact matrix
that solves the adversarial problem for $k$-sparse vectors with
some error parameter $e$. Moreover, suppose that the weight
of each column of $M$ is between $(1 - \delta)qm$ and $qm$, for
a parameter $q \in (0, 1)$ and a constant $\delta \in (0, 1)$, and that
$e = (1 - p)(1 + \delta)qm$. Then $M$ can
be used for the stochastic problem with contamination probability
$p$, and achieves error probability at most $n 2^{-\Omega(qm)}$,
where the probability is taken over the randomness of sampling
(and the constant behind $\Omega(\cdot)$ depends on $p$ and $\delta$).

Proof: Take any column $M_i$ of $M$, and let $w_i$ be
its weight. After the bit flips, we expect the weight of the
column to reduce to $p w_i$. Moreover, by Chernoff bounds,
the probability that (for "small" $\delta$) the amount of bit flips
exceeds $(1 - p) w_i (1 + \delta)$ is at most

\[ \exp(-\delta^2 (1 - p) w_i / 4) \le \exp(-\delta^2 (1 - \delta)(1 - p) qm / 4) = 2^{-\Omega(qm)}. \]

Thus, by a union bound, the probability that the amount of
bit flips at some column is not tolerable by $M$ is at most
$n 2^{-\Omega(qm)}$. ■

Remark: Note that, as we mentioned earlier, the adversarial
problem is stronger than classical group testing, and thus, any
lower bound on the number of measurements required for
classical group testing applies to our problem as well. It is
known that any measurement matrix that avoids confusion
in standard group testing requires at least $\Omega(k^2 \log_k n)$
measurements [20], [21], [22]. Thus we must necessarily
have $m = \Omega(k^2 \log_k n)$ as well, and this upper bounds
the error probability given by Proposition 4 by at most
$n 2^{-\Omega(q k^2 \log_k n)}$.

A. Decoding

Suppose that the contact matrix $M^c$ is $(k, e)$-disjunct.
Therefore, by Proposition 3 it can combinatorially distinguish
between $k$-sparse vectors in the adversarial setting with
error parameter $e$. In this work we consider a very simple
decoder that works as follows.

Distance decoder: For any column $c_i$ of the contact matrix
$M^c$, the decoder verifies the following:

\[ |\mathrm{supp}(c_i) \setminus \mathrm{supp}(y)| \le e, \tag{2} \]

where $y$ is the vector consisting of the measurement outcomes.
The coordinate $x_i$ is decided to be nonzero if and
only if the inequality holds.

Lemma 5: The distance decoder correctly identifies the
support of any $k$-sparse vector (with the above
disjunctness assumption on $M^c$).

Proof: Let $x$ be a $k$-sparse vector and $S := \mathrm{supp}(x)$,
$|S| \le k$, and let $M^c_S$ and $M^s_S$ denote the corresponding sets of
columns in the contact and sampling matrices. Obviously all the columns in $M^c_S$
satisfy (2) (as no column is perturbed in more than $e$
positions) and thus the reconstruction includes the support
of $x$ (this is true regardless of the disjunctness property
of $M^c$). Now let the vector $\hat{y}$ be the bitwise OR of the
columns in $M^c_S$ so that $\mathrm{supp}(y) \subseteq \mathrm{supp}(\hat{y})$, and assume
that there is a column $c$ of $M^c$ outside $S$ that satisfies (2).
Thus we will have $|\mathrm{supp}(c) \setminus \mathrm{supp}(\hat{y})| \le e$, and this violates
the assumption that $M^c$ is $(k, e)$-disjunct. Therefore, the
distance decoder outputs the exact support of $x$. ■
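
As an illustration of test (2), here is a minimal Python sketch of the distance decoder. The function names and the toy instance (three stacked $4 \times 4$ identity matrices, which is $(k, 2)$-disjunct for every $k$) are our own assumptions, chosen so that the adversarial guarantee of Lemma 5 can be checked by hand.

```python
def distance_decode(contact, y, e):
    """Return all indices i whose contact column c_i satisfies
    |supp(c_i) minus supp(y)| <= e, i.e., the decoder's test (2)."""
    positives = {r for r, bit in enumerate(y) if bit}
    n = len(contact[0])
    decoded = []
    for i in range(n):                       # n columns, m rows each: O(mn)
        support = {r for r, row in enumerate(contact) if row[i]}
        if len(support - positives) <= e:
            decoded.append(i)
    return decoded


def boolean_outcomes(sampling, x):
    """y = M^s x with AND as multiplication and OR as addition."""
    return [int(any(a and b for a, b in zip(row, x))) for row in sampling]


def demo():
    # Hypothetical toy contact matrix: column j has ones in rows j, j+4, j+8.
    contact = [[1 if j == r % 4 else 0 for j in range(4)] for r in range(12)]
    # Adversary erases two of the three contacts of individual 1 (0-based).
    sampling = [row[:] for row in contact]
    sampling[1][1] = 0
    sampling[5][1] = 0
    x = [0, 1, 0, 1]                          # individuals 1 and 3 are infected
    y = boolean_outcomes(sampling, x)
    return distance_decode(contact, y, e=2)


if __name__ == "__main__":
    print(demo())
```

In this instance the decoder still reports individual 1 although only one of its three contacts led to a contamination, because two missing positives stay within the tolerance $e = 2$.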

IV. Probabilistic Design 

In light of Propositions 3 and 4, we know that in order to
solve the stochastic problem with contamination probability
$p$ and sparsity $k$, it is sufficient to construct a $(k, e)$-disjunct
matrix for an appropriate choice of $e$. In this section, we
consider a probabilistic construction for $M^c$, where each
entry of $M^c$ is set to 1 independently with probability
$q := \alpha/k$, for a parameter $\alpha$ to be determined later, and
to 0 with probability $1 - q$. We will use standard arguments to
show that, if the number of measurements $m$ is sufficiently
large, then the resulting matrix $M^c$ is suitable with all but
a vanishing probability.



Let $\delta > 0$ be an arbitrary (and small) constant. Using
Chernoff bounds, we see that if $m \gg \log n$ (which will be
the case), with probability $1 - o(1)$ no column of $M^c$ will
have weight greater than $q(1 + \delta)m$ or less than $q(1 - \delta^2)m$.
Thus in order to be able to apply Proposition 4 it suffices to
set $e := (1 - p)(1 + 3\delta)qm$, as this value is larger than the error
parameter $(1 - p)(1 + \delta)^2 qm$ required by the proposition.
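
The random design and the choice of the error parameter can be written down directly. The following is a sketch under assumed names (`random_contact_matrix`, `error_parameter`, `column_weights` are hypothetical), and the numeric values in the usage part are arbitrary illustrations rather than recommended settings.

```python
import random


def random_contact_matrix(m, n, k, alpha, rng):
    """m x n contact matrix with i.i.d. entries equal to 1 with
    probability q = alpha / k and 0 otherwise."""
    q = alpha / k
    return [[1 if rng.random() < q else 0 for _ in range(n)]
            for _ in range(m)]


def error_parameter(m, k, alpha, p, delta):
    """Tolerance e = (1 - p)(1 + 3*delta) q m used with the random design."""
    q = alpha / k
    return (1 - p) * (1 + 3 * delta) * q * m


def column_weights(matrix):
    """Number of ones in each column (contacts per individual)."""
    return [sum(col) for col in zip(*matrix)]


if __name__ == "__main__":
    rng = random.Random(2024)          # seed chosen arbitrarily
    m, n, k, alpha, p, delta = 400, 50, 4, 0.5, 0.5, 0.1
    contact = random_contact_matrix(m, n, k, alpha, rng)
    weights = column_weights(contact)
    q = alpha / k
    print("expected column weight:", q * m)
    print("observed min/max weight:", min(weights), max(weights))
    print("error parameter e:", error_parameter(m, k, alpha, p, delta))
```

Comparing the observed minimum and maximum column weights with $q(1 - \delta^2)m$ and $q(1 + \delta)m$ gives a quick empirical check of the concentration step used above.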

Lemma 6: For the above choices of the parameters $q$
and $e$, the probabilistic construction obtains a $(k, e)$-disjunct
matrix with probability $1 - o(1)$ using $m = O(k^2 (\log n)/p^2)$
measurements.

Proof: Consider any set $S$ of $k$ columns of $M^c$, and
any column outside these, say the $i$th column where $i \notin S$.
First we upper bound the probability of a failure for this
choice of $S$ and $i$, i.e., the probability that the number of
positions at which the $i$th column has a 1 while all the
columns in $S$ have zeros is at most $e$. Clearly if this event
happens the $(k, e)$-disjunct property is violated. On the other
hand, if for no choice of $S$ and $i$ a failure happens the matrix
is indeed $(k, e)$-disjunct.

Now we compute the failure probability $p_f$ for a fixed $S$
and $i$. A row is good if at that row the $i$th column has a 1
but all the columns in $S$ have zeros. For a particular row, the
probability that the row is good is $q(1 - q)^k$. Then failure
corresponds to the event that the number of good rows is
at most $e$. The distribution on the number of good rows is
binomial with mean $\mu = q(1 - q)^k m$. By a Chernoff bound,
the failure probability is at most

\begin{align*}
p_f &\le \exp\bigl(-(\mu - e)^2 / (2\mu)\bigr) \\
    &= \exp\bigl(-mq\,((1 - q)^k - (1 - p)(1 + 3\delta))^2 / (2(1 - q)^k)\bigr) \\
    &\le \exp\bigl(-mq\,(1/3^{\alpha} - (1 - p)(1 + 3\delta))^2 / 2^{1 - \alpha}\bigr),
\end{align*}

where the last inequality is due to the fact that $(1 - q)^k =
(1 - \alpha/k)^k$ is always between $1/3^{\alpha}$ and $1/2^{\alpha}$. Let $\gamma :=
(1/3^{\alpha} - (1 - p)(1 + 3\delta))^2 / 2^{1 - \alpha}$. Note that by choosing the
parameters $\alpha$ and $\delta$ as sufficiently small constants, $\gamma$ can be
made arbitrarily close to $p^2/2$.

Now if we apply a union bound over all possible choices
of $S$ and $i$, the probability of coming up with a bad
choice of $M^c$ would be at most $n \binom{n}{k} \exp(-mq\gamma)$. This
probability vanishes so long as $m > k^2 \log(n/k)/(\alpha\gamma) =
O(k^2 (\log n)/p^2)$. ■
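
To get a feeling for the constants in this argument, the quantity $\gamma$ and a sufficient number of agents can be evaluated numerically. This is a rough sketch under our own assumptions: the helper names are hypothetical, the union bound is relaxed to $n \binom{n}{k} \le n^{k+1}$, and the target failure probability `fail` is a parameter we introduce for illustration (the text only requires the probability to vanish).

```python
import math


def gamma(p, alpha, delta):
    """gamma = (3^(-alpha) - (1 - p)(1 + 3*delta))^2 / 2^(1 - alpha)."""
    gap = 3.0 ** (-alpha) - (1 - p) * (1 + 3 * delta)
    if gap <= 0:
        # The Chernoff step needs the mean number of good rows to exceed e.
        raise ValueError("alpha and delta are too large for this p")
    return gap ** 2 / 2.0 ** (1 - alpha)


def sufficient_agents(n, k, p, alpha, delta, fail=0.01):
    """Smallest integer m with n^(k+1) * exp(-m * (alpha/k) * gamma) <= fail.

    Assumption: n * C(n, k) is bounded by n^(k+1) for simplicity.
    """
    g = gamma(p, alpha, delta)
    log_pairs = (k + 1) * math.log(n)
    return math.ceil(k * (log_pairs + math.log(1.0 / fail)) / (alpha * g))


if __name__ == "__main__":
    print("gamma near p^2/2:", gamma(0.5, 1e-6, 1e-6), 0.5 ** 2 / 2)
    for k in (2, 4, 8):
        print(k, sufficient_agents(n=10_000, k=k, p=0.5, alpha=0.1, delta=0.01))
```

The loop makes the quadratic growth in $k$ visible; the absolute numbers are pessimistic because the bound is not optimized over $\alpha$ and $\delta$.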

Along with Propositions 3 and 4, the result above immediately
gives the following:

Theorem 7: The probabilistic design for construction of
an $m \times n$ contact matrix $M^c$ achieves $m = O(k^2 (\log n)/p^2)$
measurements and error probability $o(1)$
for the stochastic problem using the distance decoder as the
reconstruction method.

The probabilistic construction results in a rather sparse
matrix, namely, one with density $O(1/k)$ that decays with
the sparsity parameter $k$. Below we show that sparsity is
a necessary condition for the construction to work:

Lemma 8: Let $M$ be an $m \times n$ boolean random matrix,
where $m = O(k^2 \log n)$ for an integer $k > 0$, which is
constructed by setting each entry independently to 1 with
probability $q$. Then either $q = O(\log k / k)$ or otherwise
the probability that $M$ is $(k, e)$-disjunct (for any $e \ge 0$)
approaches zero as $n$ grows.

Proof: Suppose that $M$ is an $m \times n$ matrix that is
$(k, e)$-disjunct. Observe that, for any integer $t \in (0, k)$, if we
remove any $t$ columns of $M$ and all the rows on the support
of those columns, the matrix must remain $(k - t, e)$-disjunct.
This is because any counterexample for the modified matrix
being $(k - t, e)$-disjunct can be extended to a counterexample
for $M$ being $(k, e)$-disjunct by adding the removed columns
to its support.

Now consider any $t$ columns of $M$, and denote by $m_0$ the
number of rows of $M$ at which the entries corresponding to
the chosen columns are all zeros. The expected value of $m_0$
is $(1 - q)^t m$. Moreover, for every $\delta > 0$ we have

\[ \Pr[m_0 > (1 + \delta)(1 - q)^t m] \le \exp(-\delta^2 (1 - q)^t m / 4) \tag{3} \]

by a Chernoff bound.

Let $t_0$ be the largest integer for which $(1 + \delta)(1 - q)^{t_0} m \ge
\log n$. If $t_0 < k - 1$, we let $t := 1 + t_0$ above, and this makes
the right hand side of (3) upper bounded by $o(1)$. So with
probability $1 - o(1)$, the chosen $t$ columns of $M$ will keep
$m_0$ at most $(1 + \delta)(1 - q)^t m$, and removing those columns
and the $m - m_0$ rows on their union leaves the matrix $(k - t_0 - 1, e)$-disjunct,
which obviously requires at least $\log n$ rows (as
even a $(1, 0)$-disjunct matrix needs so many rows). Therefore,
we must have

\[ (1 + \delta)(1 - q)^t m \ge \log n \]

or otherwise (with overwhelming probability) $M$ will not be
$(k, e)$-disjunct. But the latter inequality is not satisfied by the
assumption on $t_0$. So if $t_0 < k - 1$, little chance remains for
$M$ to be $(k, e)$-disjunct. Now consider the case $t_0 \ge k - 1$.
By a similar argument as above, we must have

\[ (1 + \delta)(1 - q)^k m \ge \log n \]

or otherwise the matrix will not be $(k, e)$-disjunct with
overwhelming probability. The above inequality implies that
we must have

\[ q \le \frac{\log(m(1 + \delta)/\log n)}{k}, \]

which, for $m = O(k^2 \log n)$, gives $q = O(\log k / k)$. ■
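
For completeness, the final implication of the proof can be unpacked as follows. The only ingredient not stated in the text is the standard inequality $1 - q \le e^{-q}$, and logarithms are taken to be natural; this is our reading of the step, not a derivation given by the authors.

```latex
\begin{align*}
(1+\delta)(1-q)^k m \ge \log n
  &\;\Longrightarrow\; e^{-qk} \ge (1-q)^k \ge \frac{\log n}{(1+\delta)\,m} \\
  &\;\Longrightarrow\; q \le \frac{1}{k}\,\log\frac{(1+\delta)\,m}{\log n}, \\
m = O(k^2 \log n)
  &\;\Longrightarrow\; q \le \frac{\log\bigl(O(k^2)\bigr)}{k}
     = O\!\left(\frac{\log k}{k}\right).
\end{align*}
```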

V. Explicit Design 

In the previous section we showed how a random construc- 
tion of the contact matrix achieves the desired properties for 
the adversarial (and thus, stochastic) model that we consider 
in this work. However, in principle an unfortunate choice 
of the contact matrix might fail to be of use (for example, 
it is possible though very unlikely that the contact matrix 
turns out to be all zeros) and thus it is of interest to have an 
explicit and deterministic construction of the contact matrix 
that is guaranteed to work. 

In this section, we demonstrate how a classical construc- 
tion of superimposed codes due to Kautz and Singleton [23] 



can be extended to our setting by a careful choice of the 
parameters. This is given by the following theorem. 

Theorem 9: There is an explicit construction for an $m \times n$
contact matrix that is guaranteed to be suitable for
the stochastic problem with contamination probability $p$ and
sparsity parameter $k$, and achieves $m = O(k^2 (\log^2 n)/p^2)$.

Proof: Let $m$ be an even power of a prime, and
$n' := \sqrt{m}$. Consider a Reed-Solomon code of length $n'$
and dimension $k'$ over an alphabet of size $n'$. The contact
matrix $M^c$ is designed to have $n'^{k'}$ columns, one for each
codeword. Consider a mapping $\varphi\colon \mathbb{F}_{n'} \to \mathbb{F}_2^{n'}$ that maps
each element of $\mathbb{F}_{n'}$ to a unique canonical basis vector of
length $n'$; e.g., $0 \mapsto (1, 0, 0, \ldots, 0)^\top$, $1 \mapsto (0, 1, 0, \ldots, 0)^\top$,
etc. The column corresponding to a codeword $c$ is set to the
binary vector of length $m$ that is obtained by replacing each
entry $c_i$ of $c$ by $\varphi(c_i)$, blowing up the length of $c$ from $n'$
to $n'^2 = m$.

Note that the number of columns of $M^c$ is $n := n'^{k'} = m^{k'/2}$
and each column has weight exactly $n' = m/n'$.
Moreover, the supports of any two distinct columns intersect
at less than $k'$ entries, because of the fact that the underlying
Reed-Solomon code is an MDS code and has minimum
distance $n' - k' + 1$. Thus in order to ensure that $M^c$ is
$(k, e)$-disjunct, it suffices to have $n' - kk' > e$ (so that no
set of $k$ columns of $M^c$ can cover too many entries of any
column outside the set), or equivalently,

\[ \sqrt{m} - 2k(\log n / \log m) > e. \tag{4} \]

By Proposition 4, we need to set $e := (1 - p)(1 + \delta)m/n'$
for an arbitrary constant $\delta > 0$. Thus in order to satisfy (4),
it suffices to have $\sqrt{m}\,(1 - (1 - p)(1 + \delta)) > 2k \log n$,
which gives $m > 4k^2 \log^2 n / (1 - (1 - p)(1 + \delta))^2$. As
$\delta$ can be chosen arbitrarily small, the denominator can be
made arbitrarily close to $p^2$ and thus we conclude that this
construction achieves $m = O(k^2 \log^2 n / p^2)$ measurements,
which is larger than the amount achieved by the
probabilistic construction by a factor of $O(\log n)$. ■

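
The construction in the proof can be sketched in code. The version below makes one simplifying assumption that the theorem does not need: $n'$ is taken to be a prime (so that arithmetic modulo $n'$ is a field), whereas the proof allows any prime power. The function name `kautz_singleton` is ours.

```python
from itertools import product


def kautz_singleton(prime, k_prime):
    """Contact matrix built from a Reed-Solomon code over F_prime.

    Assumption: `prime` is a prime number, so integers modulo `prime`
    form the field. Returns a (prime^2) x (prime^k_prime) 0/1 matrix.
    """
    columns = []
    for coeffs in product(range(prime), repeat=k_prime):
        # Codeword: evaluations of the message polynomial at every field
        # element (length n' = prime, dimension k').
        codeword = [
            sum(c * pow(a, d, prime) for d, c in enumerate(coeffs)) % prime
            for a in range(prime)
        ]
        # Replace every symbol by the canonical basis vector indexing it.
        column = []
        for symbol in codeword:
            unit = [0] * prime
            unit[symbol] = 1
            column.extend(unit)
        columns.append(column)
    # Transpose: rows correspond to agents, columns to individuals.
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    contact = kautz_singleton(prime=5, k_prime=2)
    print(len(contact), "agents,", len(contact[0]), "individuals")
    print("contacts per individual:", sum(row[0] for row in contact))
```

For `prime=5` and `k_prime=2` this gives $m = 25$ agents and $n = 25$ individuals, so the toy instance saves nothing; the benefit appears only when $k'$ grows, since $n = n'^{k'}$ while $m = n'^2$ stays fixed.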
Observe that, unlike the probabilistic construction of the 
previous section, the explicit construction above guarantees a 
correct reconstruction in the adversarial setting (where up to 
a $1 - p$ fraction of the entries on the support of each column of
the contact matrix might be flipped to zero). Moreover, in the 
original stochastic setting with contamination probability p, 
a single matrix given by the explicit construction guarantees 
correct reconstruction with overwhelming probability, where 
the probability is only over the randomness of the testing 
procedure. This is in contrast with the probabilistic construction,
where the failure probability is small, but originates
from two sources; namely, unfortunate outcome of the testing
procedure as well as unfortunate choice of the contact matrix.